1,577 research outputs found
An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog
We present a novel end-to-end trainable neural network model for
task-oriented dialog systems. The model is able to track dialog state, issue
API calls to knowledge base (KB), and incorporate structured KB query results
into system responses to successfully complete task-oriented dialogs. The
proposed model produces well-structured system responses by jointly learning
belief tracking and KB result processing conditioning on the dialog history. We
evaluate the model in a restaurant search domain using a dataset that is
converted from the second Dialog State Tracking Challenge (DSTC2) corpus.
Experiment results show that the proposed model can robustly track dialog state
given the dialog history. Moreover, our model demonstrates promising results in
producing appropriate system responses, outperforming prior end-to-end
trainable neural network models using per-response accuracy evaluation metrics.Comment: Published at Interspeech 201
AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis
Recently, sound recognition has been used to identify sounds, such as car and
river. However, sounds have nuances that may be better described by
adjective-noun pairs such as slow car, and verb-noun pairs such as flying
insects, which are under explored. Therefore, in this work we investigate the
relation between audio content and both adjective-noun pairs and verb-noun
pairs. Due to the lack of datasets with these kinds of annotations, we
collected and processed the AudioPairBank corpus consisting of a combined total
of 1,123 pairs and over 33,000 audio files. One contribution is the previously
unavailable documentation of the challenges and implications of collecting
audio recordings with these type of labels. A second contribution is to show
the degree of correlation between the audio content and the labels through
sound recognition experiments, which yielded results of 70% accuracy, hence
also providing a performance benchmark. The results and study in this paper
encourage further exploration of the nuances in audio and are meant to
complement similar research performed on images and text in multimedia
analysis.Comment: This paper is a revised version of "AudioSentibank: Large-scale
Semantic Ontology of Acoustic Concepts for Audio Content Analysis
- …